fix: redesign attempt_number semantics by alexluong · Pull Request #781 · hookdeck/outpost

alexluong · 2026-03-23T09:33:20Z

Summary

- Manual retries derive attempt_number from logstore (was hardcoded 0)
- Manual retry failures interact with the retry schedule: the scheduler has upsert semantics (same RetryID overwrites), so a failed manual retry atomically replaces any pending auto retry with the next backoff tier
- Budget exhaustion on manual retry cancels any lingering scheduled retry

Test plan

Unit tests pass (internal/deliverymq, internal/apirouter, internal/models)
E2E tests pass (cmd/e2e)
New TestE2E_ManualRetryScheduleInteraction validates manual retry timing and scheduler logic

Closes #662

🤖 Generated with Claude Code

vercel · 2026-03-23T09:33:27Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
outpost-docs	Ready	Preview, Comment	Mar 23, 2026 5:54pm
outpost-website	Ready	Preview, Comment	Mar 23, 2026 5:54pm

…ing (#662)
- Make attempt_number 1-indexed (was 0-indexed) across all delivery paths
- Manual retries derive attempt_number from logstore instead of hardcoding
- Manual retry failures now interact with retry schedule (cancel + reschedule)
- Budget exhaustion on manual retry cancels any lingering scheduled retry
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Replace call-recording assertions with stateful map that mirrors real scheduler upsert/delete semantics. Tests now seed initial retry state and assert the resulting state after handler execution. Use ScheduledBackoff so each tier has a distinct delay, proving the schedule advances to the correct tier. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

alexbouchardd · 2026-03-23T14:05:25Z

internal/apirouter/retry_handlers.go

+	// 3. Derive attempt number from existing attempts
+	attemptResp, err := h.logStore.ListAttempt(c.Request.Context(), logstore.ListAttemptRequest{
+		TenantIDs:      []string{event.TenantID},
+		EventIDs:       []string{req.EventID},
+		DestinationIDs: []string{req.DestinationID},
+		Limit:          1,
+		SortOrder:      "desc",
+	})
+	if err != nil {
+		AbortWithError(c, http.StatusInternalServerError, NewErrInternalServer(err))
+		return
+	}
+	attemptNumber := 1
+	if len(attemptResp.Data) > 0 {
+		attemptNumber = attemptResp.Data[0].Attempt.AttemptNumber + 1
+	}


Although I don't have a better option, this is potentially problematic. Data in the log store is not atomic and can take time to become retrievable. It's possible to end up with duplicate attempt numbers, and the performance of the query is likely not very good on CH.

Perhaps that's an acceptable tradeoff, but I wonder if there's a better approach we haven't considered yet.

Hmm, I don't see an alternative for deducing attempt_number based on available data. The only concern I have with the data is that there's a 10-30s lag between the delivery and when the log data is persisted (batching). Otherwise, this query should be sufficient.

Perf is always something to consider because you can never retrieve a single item with CH, but I think it's a tradeoff. Actually, maybe we can use this attempt data and skip querying the Event, so it should theoretically be the same as before. It's also in the retry flow, which I think is not as critical compared to the initial publish flow?

Retry executor now queries logstore for the latest attempt instead of carrying a stale attempt number in the RSMQ message. Idempotency key changed to event_id:destination_id:attempt_number so manual and auto retries with the same attempt number are deduplicated (race protection). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
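To illustrate the race protection this commit message describes, here is a rough sketch of the key format and a first-writer-wins check. Only the `event_id:destination_id:attempt_number` layout comes from the commit message. The `idempotencyKey` and `dedup` names are hypothetical, and the in-memory map is just a stand-in for the real idempotency store (it is not safe for concurrent use).

```go
package main

import (
	"fmt"
	"strconv"
)

// idempotencyKey builds a key in the event_id:destination_id:attempt_number
// layout. The function name is hypothetical.
func idempotencyKey(eventID, destinationID string, attemptNumber int) string {
	return eventID + ":" + destinationID + ":" + strconv.Itoa(attemptNumber)
}

// dedup is a hypothetical, single-goroutine stand-in for an idempotency store.
type dedup struct {
	seen map[string]struct{}
}

// claim reports whether the caller is the first to use this key.
func (d *dedup) claim(key string) bool {
	if _, ok := d.seen[key]; ok {
		return false
	}
	d.seen[key] = struct{}{}
	return true
}

func main() {
	d := &dedup{seen: make(map[string]struct{})}

	// A manual retry and an auto retry both read "latest attempt = 1" from
	// the log store and therefore both compute attempt number 2.
	manual := idempotencyKey("evt_1", "dst_1", 2)
	auto := idempotencyKey("evt_1", "dst_1", 2)

	fmt.Println(d.claim(manual)) // true: the first delivery proceeds
	fmt.Println(d.claim(auto))   // false: the duplicate is dropped
}
```

Because the attempt number is part of the key, two retries that derive the same number collapse into one delivery, while the next attempt gets a fresh key.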

…t number ListAttempt already returns the associated Event on each AttemptRecord. Replace the two-query pattern (RetrieveEvent + ListAttempt) with a single ListAttempt call in both the retry executor and the API retry handler. Remove RetrieveEvent from RetryEventGetter interface. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

EnsureLocalStack/EnsureRabbitMQ/EnsureGCP were called before testinfra.Start(t) had a chance to skip, causing container startups in CI even with -short. Add testutil.CheckIntegrationTest(t) at the top of each test function so the skip runs first. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

alexluong changed the title ~~fix: redesign attempt_number semantics (#662)~~ fix: redesign attempt_number semantics Mar 23, 2026

vercel bot deployed to Preview – outpost-docs March 23, 2026 09:33 View deployment

vercel bot deployed to Preview – outpost-website March 23, 2026 09:33 View deployment

alexluong force-pushed the 662-attempt-number-semantics branch from be5a99d to 57b243e Compare March 23, 2026 09:38

vercel bot deployed to Preview – outpost-docs March 23, 2026 09:39 View deployment

vercel bot deployed to Preview – outpost-website March 23, 2026 09:39 View deployment

alexluong force-pushed the 662-attempt-number-semantics branch from 57b243e to 131a85f Compare March 23, 2026 09:41

vercel bot deployed to Preview – outpost-docs March 23, 2026 09:42 View deployment

vercel bot deployed to Preview – outpost-website March 23, 2026 09:42 View deployment

alexbouchardd approved these changes Mar 23, 2026

View reviewed changes

alexluong and others added 2 commits March 23, 2026 21:15

vercel bot deployed to Preview – outpost-website March 23, 2026 17:28 View deployment

vercel bot deployed to Preview – outpost-docs March 23, 2026 17:28 View deployment

vercel bot deployed to Preview – outpost-docs March 23, 2026 17:54 View deployment

vercel bot deployed to Preview – outpost-website March 23, 2026 17:54 View deployment

alexluong merged commit 59d99df into main Mar 23, 2026
4 checks passed

alexluong deleted the 662-attempt-number-semantics branch March 23, 2026 18:26

alexluong merged 5 commits into main from 662-attempt-number-semantics
